-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Modify recon storage tables, trait and sqlite config to improve throughput #243
Conversation
insert and insert_many have been modified to perform better in sqlite. We grab a transaction and do all our writes, quitting early if we've done anything before. This has appeared to about 2x throughput from the shared pool and multiple write operations we had before
synchronous = normal is typically sufficient in WAL journal mode, and demonstrates a 2x increase in writes in my benchmarks
It is a table scan. In the synchronized state two nodes will be doing a hash_range query for an entire model range (i.e. a single interest). If they both get the same hash then they are synchronized and they stop the synchronization process. This means we often do hash_range queries over large ranges which require that we do the sum for that entire range. I would expect this query to grow as datasize grows. (We have some medium term) plans for how to address this but for now hopefully we can kept it fast enough.
FWIW the biased select usage is intentional to avoid potential deadlocks in the protocol. Happy to explore its usage more in depth.
I discovered an easy optimization we can make to the middle query last time I was in the code just didn't have to the time to add it. In fact we may not even need middle anymore. The indent is that we use middle to split the key space. It does this by counting the rows. However hash_range also counts the rows now and we always run hash_range right before we we run middle. So we can just reuse that count to split instead of calling middle. I'll create an issue for this so we can track it. Still reviewing the code... Thanks for the in depth analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks great
Yep, makes sense. I looked more closely and figured out what it's doing, and it clicked seeing the min/max fenceposts being used. Since it should be associative and we don't support deletion, we can probably create a new table that's updated by a trigger that keeps the hash (total, over some ranges, etc). Think this is worth a discussion.
Yeah, that makes sense. I realized that was probably the reason. What I was seeing and want to figure out, is why it can exchange 1000+ keys/sec, but it's staying below the write speed when it's steady. I was assuming it might be the loop and message exchange frequency, but it could also be key discovery and when there are huge differences it shrinks faster.
Awesome, we can discuss this with the hash_range optimizations and tackle in a follow up.
|
we need to figure out how to make these deterministic. currently they exact order of exchange depends on the speed at which the other side takes certain actions. speeding up the inserts appears to have caused a resp before a second req in some cases.
febf9a4
to
88aac7b
Compare
Linear WS1-1432.
Before making any changes, I was seeing about 400/100 rps keys added/synced. After these changes, we're able to push the incoming over 1k, and a steady workload targeting 300 completed with ~310/240 keys exchanged/sec.
Changes:
recon_value
. This doesn't have any real benefit up front, but as the table grows it will help. Eventually, a single table will pay during reads as scans will load a lot more data. The value is about 10x bigger and will explode row sizes: a key is ~80 bytes and the value is ~1KB. The initial change moved writes up and syncs down: 450/65.pragma synchronous = NORMAL
had a huge impact after the other changes. When I tried it originally, it made zero noticeable difference. After the batched write/transaction usage, it sped up another 80% to 1120/150.Follow ups:
hash_range
query (or rather, the indexes/table layout), after a certain point it begins it increase in duration linearly (seems like it's potentially a table scan as we add more rows).BEGIN IMMEDIATE
for transactions would be faster (not in sqlx currently, so would likely require a wrapper to make sure drop is called/things are rolled back, or updating to a single statement).mmap_size
ortemp_store = memory
might be worth changing, and possibly other pragma options.select!(biased)
usage). It's not necessarily a bad thing, but we may want to look at it.Steady simulation results:
I ran a simulation with the following spec and it almost hit 300 rps on both sides. One side was 280314 over 15m for 311 rps. The sync target was 217945 new keys for a rate of 242 rps. It took about 5 minutes for the recv to catch up, but it ended at 380313 keys (one short) 🤔 This seems like it may be a result of the hanging hash_range queries.
The SQL queries were quite consistent throughout and much more performant. Before these changes, the best I was seeing was 1.5m on inserts and upwards of 2.5m depending on the query setup. Here things stay about 0.7m on the high end, while most are much faster.
Difference between keys on both sides (logarithmic)
Simulation sync rate
Stress test results:
When adding keys much faster to one node, we see the
middle
andhash_range
queries on the write side taking the longest. This is 5x what we were seeing above, but they do level out pretty quickly. The second spike resulted in a substantial dip in sync rate. Unfortunately, the result was 1008588 keys on one side and only 883593 on the other after 30 minutes and nothing was happening and thehash_range
duration kept increasing (over 1s queries in the logs).Botched metrics:
Unfortunately, all of the values below are inflated due to counting value updates (existing key) as a new key. It is interesting compared to the corrected results, as it demonstrates how syncing the values for keys we already have is what is consuming most of the remaining distance between the new keys on the writer and our side. One side was 280395 req in 15 minutes for 311 rps and the other was 282165 (counting keys and new_keys || values), which shows the key + value follow up are going faster than the incoming requests, but just getting keys is not.